AITopics | normalized loss

Collaborating Authors

normalized loss

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Put CASH on Bandits: A Max K-Armed Problem for Automated Machine Learning

Balef, Amir Rezaei, Vernade, Claire, Eggensperger, Katharina

arXiv.org Artificial IntelligenceNov-20-2025

The Combined Algorithm Selection and Hyperparameter optimization (CASH) is a challenging resource allocation problem in the field of AutoML. We propose MaxUCB, a max k-armed bandit method to trade off exploring different model classes and conducting hyperparameter optimization. MaxUCB is specifically designed for the light-tailed and bounded reward distributions arising in this setting and, thus, provides an efficient alternative compared to classic max k-armed bandit methods assuming heavy-tailed reward distributions. We theoretically and empirically evaluate our method on four standard AutoML benchmarks, demonstrating superior performance over prior approaches. We make our code and data available at https://github.com/amirbalef/CASH_with_Bandits

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2505.05226

Country: Europe > Germany (0.27)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Industry: Information Technology (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Bias of Stochastic Gradient Descent or the Architecture: Disentangling the Effects of Overparameterization of Neural Networks

Peleg, Amit, Hein, Matthias

arXiv.org Artificial IntelligenceJul-4-2024

The implicit bias Neural networks typically generalize well when of Stochastic Gradient Descent (SGD) is thus often thought fitting the data perfectly, even though they are to be the main reason behind generalization (Arora et al., heavily overparameterized. Many factors have 2019; Shah et al., 2020). A recent thought-provoking study been pointed out as the reason for this phenomenon, by Chiang et al. (2023) suggests the idea of the volume hypothesis including an implicit bias of stochastic for generalization: well-generalizing basins of the gradient descent (SGD) and a possible simplicity loss occupy a significantly larger volume in the weight space bias arising from the neural network architecture. of neural networks than basins that do not generalize well. The goal of this paper is to disentangle the factors They argue that the generalization performance of neural that influence generalization stemming from optimization networks is primarily a bias of the architecture and that the and architectural choices by studying implicit bias of SGD is only a secondary effect. To this end, random and SGD-optimized networks that achieve they randomly sample networks that achieve zero training zero training error. We experimentally show, in error (which they term Guess and Check (G&C)) and argue the low sample regime, that overparameterization that the generalization performance of these networks is in terms of increasing width is beneficial for generalization, qualitatively similar to networks found by SGD. and this benefit is due to the bias of SGD and not due to an architectural bias. In contrast, In this work, we revisit the approach of Chiang et al. (2023) for increasing depth, overparameterization and study it in detail to disentangle the effects of implicit is detrimental for generalization, but random and bias of SGD from a potential bias of the choice of architecture. SGD-optimized networks behave similarly, so this As we have to compare to randomly sampled neural can be attributed to an architectural bias.

accuracy, initialization, normalized loss, (15 more...)

arXiv.org Artificial Intelligence

2407.03848

Country:

Europe > Austria > Vienna (0.14)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
North America > United States > New Mexico > Santa Fe County > Santa Fe (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Measuring Feature Sparsity in Language Models

Deng, Mingyang, Tao, Lucas, Benton, Joe

arXiv.org Artificial IntelligenceOct-13-2023

Recent works have proposed that activations in language models can be modelled as sparse linear combinations of vectors corresponding to features of input text. Under this assumption, these works aimed to reconstruct feature directions using sparse coding. We develop metrics to assess the success of these sparse coding techniques and test the validity of the linearity and sparsity assumptions. We show our metrics can predict the level of sparsity on synthetic sparse linear activations, and can distinguish between sparse linear data and several other distributions. We use our metrics to measure levels of sparsity in several language models. We find evidence that language model activations can be accurately modelled by sparse linear combinations of features, significantly more so than control datasets. We also show that model activations appear to be sparsest in the first and final layers.

activation, sparse, sparsity, (14 more...)

arXiv.org Artificial Intelligence

2310.07837

Country:

North America > Canada > Quebec > Montreal (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > Vietnam > Hanoi > Hanoi (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Azure Machine Learning - Automobile Price Prediction Tutorial

#artificialintelligenceOct-28-2021, 18:05:09 GMT

In this article, we'll go through a hands-on experience to build a machine learning model to predict price of automobiles. The previous article explored about Azure Machine Learning and we went through a step-by-step process to create Machine Learning Workspace in Azure, creating the compute instances and compute cluster. This article builds up to the last article – designing a full-on machine learning project. We explore on developing a machine learning model for Automobile Price Prediction in this article. The machine learning workflow are explained and discussed in detail in the process.

azure machine learning, dataset, linear regression, (6 more...)

#artificialintelligence

Genre: Workflow (0.56)

Industry:

Transportation > Passenger (0.89)
Transportation > Ground > Road (0.89)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.33)

Add feedback